Genetic information contains a record of the history of
our species, and technological advances have trans- 
formed our ability to access this record. Many studies 
have used genome-wide data from populations today to 
learn about the peopling of the globe and subsequent 
adaptation to local conditions. Implicit in this research is 
the assumption that the geographic locations of people 
today are informative about the geographic locations of 
their ancestors in the distant past. However, it is now 
clear that long-range migration, admixture, and popula- 
tion replacement subsequent to the initial out-of-Africa 
expansion have altered the genetic structure of most of 
the world's human populations. In light of this we argue 
that it is time to critically reevaluate current models of 
the peopling of the globe, as well as the importance of 
natural selection in determining the geographic distri- 
bution of phenotypes. We specifically highlight the 
transformative potential of ancient DNA. By accessing
the genetic make-up of populations living at archaeolog- 
ically known times and places, ancient DNA makes it 
possible to directly track migrations and responses to 
natural selection. 

Introduction 

Within the past 100 000 years anatomically modern 
humans have expanded to occupy every habitable area 
of the globe. The history of this expansion has been ex- 
plored with tools from several disciplines including linguis- 
tics, archaeology, physical anthropology, and genetics. 
These disciplines all can be used to ask the same question: 
how did we get to where we are today? 

Attempts to answer this question have often taken the 
form of a dialectic between two hypotheses. On the one 
hand are arguments in favor of demographic stasis, which 
propose that the inhabitants of a region are the descen- 
dants of the first people to arrive there. On the other side 
are the arguments in favor of rapid demographic change, 
which propose that the present-day inhabitants of a region
descend from people who arrived during periods of tech- 
nological or cultural change, replacing the previous inha- 
bitants. 

In archaeology, this debate has played out around the 
issue of whether sudden changes in material culture ap- 
parent in the archaeological record can be attributed to the 
spread of culture or to population movements: ‘pots versus 
people’ [1]. In physical anthropology, the debate has played
out around the issue of whether changes in morphological 
characters over time are due to in situ evolution or to the 
arrival of new populations (e.g., [2]). 

The same debate has also played out in genetics. On the 
side of population replacements, there are the ‘wave of 
advance’ and ‘demic diffusion’ models, first proposed to 
describe the spread of agriculture through Europe. In these 
models, the Neolithic transition was accompanied by the 
spread of farmers from the Near East across Europe, who 
partially or completely replaced resident hunter-gatherers 
[3-6]. On the side of stasis, there are the ‘serial founder 
effect’ models [7,8], which proposed that populations have 
remained in the locations they first colonized after the out- 
of-Africa expansion, exchanging migrants only at a low 
rate with their immediate neighbors until the long-range 
migrations of the past 500 years [9-12].

These genetic models - the wave-of-advance models on 
the one hand, and the serial founder effect models on the 
other - were proposed before the availability of large-scale 
genomic data. The great synthesis of genetic data with 
historical, archaeological and linguistic information, ‘The 
History and Geography of Human Genes’ [13], was published
in 1994 based on data from around 100 protein
polymorphisms, and the papers that popularized the notion
of a serial founder effect model were written based on
data from around 1000 microsatellites. However, it is now
possible to genotype millions of polymorphisms in thousands
of individuals using high-throughput sequencing.
Because of these technological advances, the past few years
have seen a dramatic increase in the quantity of data
available for learning about human history. Equally important
has been rapid innovation in methods for making
inferences from these data. We argue here that the technological
breakthroughs of the past few years motivate a
systematic reevaluation of human history using modern
genomic tools - a new ‘History and Geography of Human
Genes’ that exploits many orders of magnitude more data
than the original synthesis.

Glossary

Admixture: a sudden increase in gene flow between two differentiated
populations.

Bottleneck: a temporary decrease in population size in the history of a
population.

Gene flow: the exchange of genes between two populations as a result of
interbreeding.

Heterozygosity: the number of differences between two random copies of a
genome in a population.

In the first section of the paper we summarize what we 
see as major lessons from the recent literature. In particu- 
lar, it is now clear that the data contradict any model in 
which the genetic structure of the world today is approxi- 
mately the same as it was immediately following the out- 
of-Africa expansion. Instead, the past 50 000 years of 
human history have witnessed major upheavals, such that 
much of the geographic information about the first human 
migrations has been overwritten by subsequent population 
movements. However, the data also often contradict mod- 
els of population replacement: when two distinct popula- 
tion groups come together during demographic expansions 
the result is often genetic admixture rather than complete 
replacement. This suggests that new types of models - with 
admixture at their center - are necessary for describing 
key aspects of human history (see [14-16] for early examples of
admixture models).

In the second section of the paper we sketch out a way 
forward for data-driven construction of these models. We 
specifically highlight the potential of ancient DNA studies 
of individuals from archaeologically important cultures. 
Such studies in principle provide a source of information 
about history that bypasses some fundamental ambigui- 
ties in the interpretation of genetic, archaeological, or 
anthropological evidence alone. We discuss several poten- 
tial applications of this technology to outstanding ques- 
tions in human history. 

There are a number of excellent articles that have 
reviewed the literature on genome-wide studies of human 
history [17-23]. Our focus here is not on providing a 
comprehensive review but instead on highlighting promis- 
ing directions for future research. 

Reevaluation of the 'serial founder effect' model 

We begin with an illustration of some of the ambiguities in 
interpretation of genetic data in the context of the ‘serial 
founder effect’ model. This model, initially proposed by 
Harpending, Eller, and Rogers [24,25], gained popularity 
with the publication of two papers [7,8] that observed that 
heterozygosity (see Glossary) declines approximately line- 
arly with geographic distance from Africa. This pattern, 
initially identified in genome-wide microsatellite geno- 
types from around 50 worldwide human populations, 
was subsequently confirmed based on patterns of haplo- 
type diversity in large single nucleotide polymorphism 
(SNP) datasets [26].


The observation of a smooth decline in human diversi- 
ty with increasing geographic distance from Africa was 
interpreted as a window into demographic events deep in 
our species’ past. Specifically, in the serial founder effect 
and related models (Figure 1A) [12], the peopling of the 
globe proceeded by an iterative process in which small 
bands of individuals pushed into unoccupied territory, 
experienced population expansions, and subsequently 
gave rise to new small bands of individuals who then 
pushed further into unoccupied territory. This model has 
two important features: a large number of expansions 
into new territory by small groups of individuals (and 
concurrent bottlenecks), and little subsequent migration 
[7-12]. As Prugnolle et al. [7] wrote: ‘what is clear... is that
[this] pattern of constant loss of genetic diversity along 
colonisation routes could only have arisen through suc- 
cessive bottlenecks of small amplitude as the range of our 
species increased... The pattern we observe also suggests 
that subsequent migration was limited or at least very 
localised’. This qualitative conclusion was followed by 
more quantitative ones; in an explicit fit of the serial 
founder effect model to data, Deshpande et al. [9] con- 
cluded that ‘incorporation into existing models of ex- 
change between neighbouring populations is essential, 
but at a very low rate’. 

This model has been influential in many fields. If it is 
correct, then the difficult problem of identifying the geo- 
graphic origin of all modern humans is reduced to the 
simpler problem of finding the geographic region where 
people have the most genetic diversity [8]. This idea has 
been used extensively in discussions of human origins (e.g., 
[27,28]). The model also provides a null hypothesis of 
limited migration against which alternatives can be tested 
[29,30]. Outside genetics, the serial founder effect model 
has been used as a framework to interpret data from 
linguistics [31], physical anthropology [32-35], material 
culture [36], and economics [37]. 

The serial founder effect model, however, is only one of 
many models that can produce qualitatively similar pat- 
terns of genetic diversity (e.g., [38,39]). Producing the 
empirical pattern of a smooth decline in diversity with 
distance from Africa simply requires that the average time 
to the most recent common ancestor between two chromo- 
somes in a population depend on the distance of that 
population from Africa [12,39]. 
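
The dependence of diversity on coalescence time can be written out explicitly. The following is a sketch under the standard infinite-sites assumption; the symbols are notation introduced here for illustration and are not taken from the models above: \mu is the per-generation mutation rate of the region, T_d is the time in generations to the most recent common ancestor of two chromosomes sampled at distance d from Africa, and N is a constant diploid population size.

```latex
\mathbb{E}[H_d] = 2\mu\,\mathbb{E}[T_d],
\qquad
\text{and, for a single constant-size population, }
\mathbb{E}[T] = 2N \;\Rightarrow\; \mathbb{E}[H] = 4N\mu .
```

Under this relation, any history in which \mathbb{E}[T_d] decreases smoothly with d produces a smooth decline in heterozygosity, whether the decrease arises from bottlenecks or from admixture.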

To illustrate this point, we constructed two models 
that produce smooth declines in diversity with distance 
from Africa as in the serial founder effect model. Howev- 
er, these models differ qualitatively from the serial 
founder model in that this pattern is driven by admixture 
rather than bottlenecks in the distant past. These are 
(i) a model with two severe bottlenecks and extensive 
subsequent population admixture (Figure 1B) and (ii) a
model without bottlenecks but with archaic admixture 
(from an anciently diverged population such as Nean- 
derthals) as well as extensive recent population admix- 
ture (Figure 1C). Details of the model specifications 
and simulation parameters are in the Supplementary 
Material online. Using ms [40], we simulated 1000 
regions of 100 kb under the serial founder effect model 
and each of these two models, and plotted the average
heterozygosity in each population. In all three cases
(Figure 1D-F) we recapitulated a smooth linear decline
in heterozygosity with distance from a reference population
(in the serial founder model, this reference population
can be interpreted as the source of the expansion, but
there is no analogous interpretation of this population
in the other models). Qualitative patterns of linkage
disequilibrium were also similar in all scenarios
(Figure S3).

Figure 1. A negative correlation between heterozygosity and geographic distance from a source population can be generated by qualitatively different, historically plausible
demographic models. We simulated genetic data under different demographic models and calculated the average heterozygosity in each simulated population. (A)
Schematic of a serial founder effect model. (B) Schematic of a demographic model with two bottlenecks and extensive admixture. (C) Schematic of a demographic model
with no bottlenecks and extensive admixture. (D-F) Average heterozygosity in each population simulated under the demographic models in A, B, and C respectively. Each
point represents a population, ordered along the x-axis as in A.
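
The qualitative point can be illustrated without coalescent simulation. The Python sketch below is not the ms analysis described above; it computes expected heterozygosity deterministically under two toy histories. In the first, each founding event is assumed to pass through a one-generation bottleneck of hypothetical size `bottleneck_size`, so heterozygosity shrinks by a factor of 1 - 1/(2N) per step. In the second, there are no bottlenecks along the chain: each population is a mixture of two hypothetical source populations with mixture proportion varying along the chain. All function names and parameter values are illustrative assumptions.

```python
def serial_founder_heterozygosity(h0, n_populations, bottleneck_size):
    """Expected heterozygosity after k successive one-generation bottlenecks.

    Each bottleneck of diploid size N multiplies heterozygosity by 1 - 1/(2N).
    """
    decay = 1.0 - 1.0 / (2.0 * bottleneck_size)
    return [h0 * decay ** k for k in range(n_populations)]


def admixture_cline_heterozygosity(h_a, h_b, h_between, n_populations):
    """Expected heterozygosity along a chain of freshly admixed populations.

    Population k draws a fraction alpha of its chromosomes from source A and
    1 - alpha from source B. Two random chromosomes both come from A with
    probability alpha^2, both from B with probability (1 - alpha)^2, and one
    from each otherwise.
    """
    values = []
    for k in range(n_populations):
        alpha = 1.0 - k / (n_populations - 1)
        h = (alpha ** 2 * h_a
             + (1.0 - alpha) ** 2 * h_b
             + 2.0 * alpha * (1.0 - alpha) * h_between)
        values.append(h)
    return values


# Hypothetical parameter values chosen only for illustration.
founder = serial_founder_heterozygosity(h0=0.30, n_populations=10, bottleneck_size=25)
cline = admixture_cline_heterozygosity(h_a=0.30, h_b=0.20, h_between=0.28, n_populations=10)

for k, (hf, hc) in enumerate(zip(founder, cline)):
    print(f"population {k}: founder model {hf:.4f}, admixture model {hc:.4f}")
```

Both series decline monotonically with position along the chain, so a decline in heterozygosity alone does not discriminate between the two histories.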

These simulations show that the main observation that 
has been marshaled in support of the serial founder effect 
model is also consistent with very different histories (see 
also [29,38,39,41]). Specifically, in the absence of addition- 
al data, the smooth linear decline in heterozygosity away 
from Africa could represent a signal of many population 
bottlenecks during the initial out-of-Africa expansion tens 
of thousands of years ago, or it could represent a signal of 
extensive population mixture within the past few thousand 
years (or, of course, a combination of these or many other 
models that we have not considered). Because the data are 
compatible with both, arguing for one over the other 
involves a subjective determination of which class of model 
is more likely a priori. Perhaps the most important issue 
affecting this determination is how important migration 
has been over the past 50 000 years of human history. How 
representative are populations today of the populations 
that lived in the same locations after the out-of-Africa 
expansion? 


Empirical data have shown that the current inhabitants 
of a region are often poor representatives of the 
populations that lived there in the distant past 

The answer to the question posed above has been the 
subject of considerable research over the past several 
years. In our opinion one finding is already clear: long- 
range migration and concomitant population replacement 
or admixture have occurred often enough in recent human 
history that the present-day inhabitants of many places in 
the world are rarely related in a simple manner to the more 
ancient peoples of the same region (Figure 2). 

The Americas over the past 500 years present one recent 
example. The Americas experienced massive demographic 
change after the arrival of Europeans and Africans, such 
that most of the ancestry of the Americas is not derived 
from the Native Americans who were the sole inhabitants 
of the region half a millennium ago [42-47]. Another recent
example is Australia, where European migration over the 
past couple of hundred years is the main source of the 
genetic material in the region today [48].

Figure 2. A rough guide to genetically documented population movements in the history of anatomically modern humans.

An example from further back in time comes from the
present-day hunter-gatherer and pastoralist populations
of Siberia, which are often treated as surrogates for the
populations that crossed the Bering land bridge to people
the Americas beginning more than 15 000 years ago
(e.g., [47,49-51]). DNA sequences from two individuals
who lived in the Lake Baikal region of Siberia ~24 000
years before the present and ~17 000 years before the
present, respectively, have indicated that this assumption
is unfounded. These two ancient individuals (from periods 
before the time the Americas were thought to be peopled) 
are more closely related to present-day Native Americans 
than are present-day Siberians [52]. They appear to have
been members of an ‘Ancient North Eurasian’ population 
that no longer exists in unmixed form, but that admixed 
substantially with the ancestors of both present-day Europeans
and Native Americans [52-54]. By contrast, most of
the present-day indigenous populations of Siberia are more 
closely related to populations currently living in East Asia, 
indicating that the present-day indigenous population of 
Siberia is descended in large part from populations that 
arrived in the region after the end of the last ice age [52].

Ancient DNA studies have also made considerable prog- 
ress in resolving the major debate about whether the 
arrival of agriculture in Europe involved the spread of 
people or technology [53,55-63]. A key observation comes
from genome-wide data from hunter-gatherer and agricul- 
turalist populations that lived around 5000 years ago in 
present-day Sweden [62,63]. The farmer population 
appears most genetically similar to southern Europeans 
today, whereas the hunter-gatherers are more similar to 
northern Europeans (and, notably in light of the discussion 
of the serial founder effect model above, have levels of 
genetic diversity lower than in the modern European and 
East Asian populations [63]). Thus, at least in Scandinavia,
the spread of agriculture was accompanied by the spread of 
people. The outcome of this spread of people was not 
population replacement, but rather admixture, such that 
European populations today trace some of their ancestry to 
both ancestral populations [53,62]. Mitochondrial DNA 
(mtDNA) studies suggest that this dynamic is characteristic
of the arrival of agriculture throughout Europe [55-58].

The arrival of farmers was not the end of prehistoric 
migration in Europe (even putting aside discussion of 
migrations since the invention of writing; see [64-67]). 
In a single geographic region in present-day Germany,
mtDNA has been obtained from hundreds of human 
samples from archaeological cultures ranging from the 
early Neolithic to the Bronze Age [56]. There is an apparent
genetic discontinuity between people of early and late
Neolithic cultures. In particular, people of late Neolithic
cultures bear more relatedness to the present-day popu- 
lations of Eastern Europe and Russia than do people of 
early Neolithic cultures. Thus, demographic turnover has 
apparently occurred at least twice over the course of the 
past 8000 years of European prehistory. This makes 
inferences about the inhabitants of Europe tens of thou- 
sands of years ago based on the locations of people today 
unreliable. 

Evidence of major population mixture in the past sev- 
eral thousand years has also accumulated in parts of the 
world where no ancient DNA is (yet) available. A series of 
studies have detected and dated admixture in a range of 
human populations. These studies make it clear that there 
are multiple distinct sources of ancestry in most popula- 
tions. We caution, however, that without ancient DNA it is 
not possible to be confident which populations lived in a 
region before the admixture: 

India. Here nearly all people today are admixed be- 
tween two distinct groups, one most closely related to 
present-day Europeans, Central Asians, and Near East- 
erners, and one most closely related to isolated populations 
in the Andaman islands [68]. Much of this admixture 
occurred within the past 4000 years [69]. 

North Africa. In this region nearly all people today 
descend from admixture between populations related to 
those present today in western Africa and in the Near East 
[70-72]. Some of this mixture can be dated to within the 
past few thousand years [65], indicating that much of the 
ancestry from these populations does not descend continu- 
ously from the Stone Age peoples of North Africa. 

Sub-Saharan Africa. Genetic studies have documented 
multiple examples of populations with ancestry from dis- 
parate sources. Many populations across sub-Saharan 
Africa trace some fraction of their ancestry to admixture 
in the past several thousand years with populations related
to those in western Africa [65]. Populations in eastern
and southern Africa have been influenced by gene flow 
from west Eurasian-related populations in the past 3000 
years [73,74]. In Madagascar all populations derive approximately
half of their ancestry from populations related
to those currently living in southeast Asia [75].

Beyond case studies, several groups have applied tests
for admixture to diverse populations from around the globe 
and found that nearly all populations show evidence of 
admixture [54,65,76]. To illustrate the degree to which this
admixture involved populations whose closest genetic rela- 
tives today are geographically distant from each other, in 
Figure 3 we present an analysis based on combining data 
for 103 worldwide populations from several sources [26-28,73,77,78]
and running a simple three-population test for
admixture [68] on all populations (for further details see 
the Supplementary Material online). It is important to 
recognize that this test is not perfectly sensitive - for 
example, large amounts of genetic drift in the history of 
a population can mask the statistical signal of an admix- 
ture event - but when the test does detect a signal it 
provides incontrovertible evidence of gene flow into the 
ancestors of the population [54]. For all populations with
statistically significant evidence of admixture, we identi- 
fied the present-day populations that share the most ge- 
netic drift with the admixing ancestral populations. 
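
The logic of a three-population test can be sketched in a few lines of Python. The statistic below, commonly written f3(C; A, B), averages (c - a)(c - b) over SNPs, where a, b, and c are allele frequencies in two reference populations and a target population; a significantly negative value indicates that the target is admixed between populations related to the references. This is a simplified illustration on made-up population frequencies: it omits the finite-sample bias correction and the block-jackknife standard error that a real analysis requires, and the function name and frequency values are hypothetical.

```python
def f3(target, source1, source2):
    """Mean of (c - a) * (c - b) across SNPs.

    target, source1, source2: equal-length lists of allele frequencies.
    A negative value suggests the target is a mixture of populations
    related to source1 and source2.
    """
    if not (len(target) == len(source1) == len(source2)):
        raise ValueError("frequency lists must have the same length")
    products = [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]
    return sum(products) / len(products)


# Hypothetical allele frequencies at five SNPs in two source populations.
freq_a = [0.10, 0.80, 0.35, 0.60, 0.05]
freq_b = [0.70, 0.20, 0.40, 0.10, 0.55]

# Population C is an idealized 50/50 mixture of A and B with no later drift.
freq_c = [0.5 * a + 0.5 * b for a, b in zip(freq_a, freq_b)]

f3_admixed = f3(freq_c, freq_a, freq_b)  # target is the mixed population
f3_source = f3(freq_a, freq_b, freq_c)   # target is an unmixed source

print(f"f3(C; A, B) = {f3_admixed:.6f}")
print(f"f3(A; B, C) = {f3_source:.6f}")
```

In this idealized case the statistic for the mixed population equals -0.25 times the mean squared frequency difference between the sources, so it is negative, whereas the statistic with a source population as target is positive. Drift in the target after admixture adds a positive term, which is why the test can miss real admixture events.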


Figure 3 shows the geographic locations of all the 
populations, together with the locations of the best pres- 
ent-day proxies of their ancestral populations. Admixture 
between populations related to ones that are now geo- 
graphically distant is evident in most populations of the 
world. For example, Native American-related ancestry is 
present throughout Europe [54], likely reflecting the genetic
input from the Ancient North Eurasian population
related to Upper Paleolithic Siberians [53,54] both
into the Americas (most likely before 15 000 years ago) and 
into Europe [52,53]. In addition, ancestry from a popula- 
tion related to those living in the Near East is found in 
Cambodia [30], likely due to mixture from an ancestral 
South Asian population that was itself an admixed popu- 
lation containing ancestry related to present-day Near 
Easterners [65]. The test we use as the basis for 
Figure 3 detects only one signal of admixture per popula- 
tion, and cannot detect complete population replacement. 
The true population history is thus likely to have been 
even more complex. 
Figure 3. A bird's-eye view of admixture in human populations. We performed a three-population test for admixture (Reich et al. [68]) on 103 worldwide populations. Circles
show the approximate current geographic locations of all tested populations (except for populations in the Americas, which are not plotted for ease of display). Filled circles
represent populations identified as admixed, and the colors represent the current geographic labels of the inferred admixing populations. Empty circles represent
populations with no statistically significant evidence for admixture in this test (however, in many of these instances, other tests do detect evidence of mixture [65,76]).

These examples show that the populations in a given
region today are rarely descended in a simple manner from 
the inhabitants in the distant past. This provides further 
evidence that the serial founder effect model is no longer a 
reasonable null model for the relationship between present-day
populations and their ancestors. Instead, clines in
genetic diversity observed in data may often be better 
modeled as outcomes of admixture (as in Figures 1B,C) 
rather than a series of bottlenecks. 

Ancient DNA: a transformative source of information 
about the past 

The types of models that are most useful for making sense 
of human history are those that specify geographic as well 
as temporal information; that is, those that make state- 
ments of the form ‘a population from location X moved to 
location Y during time-period Z’. For example, one might 
wish to test whether the first pastoralists in southern 
Africa arrived as migrants from eastern Africa after 
~2500 years ago (the time when evidence of pastoralism 
in southern Africa appears in the archaeological record), 
against the alternative hypothesis that a pastoralist life- 
style was adopted by indigenous people who learned it 
through cultural transmission. Testing this hypothesis 
using DNA from individuals today involves assuming that 
populations in southern and eastern Africa today are 
representative of the populations in southern and eastern 
Africa at the times of interest [74,79,80]. This assumption
is often difficult or impossible to test. 

Studies of DNA from ancient human remains offer a way 
around this limitation. Instead of studying the past by the 
traces it has left in present-day people - which is problem- 
atic because human individuals and even whole popula- 
tions are capable of migrating hundreds or thousands of 
kilometers in a lifetime - ancient DNA offers the ability to 
analyze the genetic patterns that existed at a particular 
time and geographical location. This allows direct infer- 
ence about the relationships of historical populations to 
each other and to populations living today. Box 1 lists some 
surprising findings about human history based on ancient 
DNA, which would have been difficult or impossible to 
obtain without this source of information. 

Ancient DNA results are so regularly surprising that 
almost any measurement is interesting: new historical 
discoveries have been made in virtually every ancient 
DNA study that has been carried out. The reason why 
ancient DNA studies are so informative is that the tech- 
nology provides a tool to measure quantities that were 
previously unmeasurable. In this sense, the value of an- 
cient DNA technology as a window into ancient migrations 
is analogous to the 17th century invention of the light 
microscope as a window into the world of microbes and 
cells. 

Scientific opportunities for ancient DNA studies

Moving forward, ancient DNA studies afford major oppor- 
tunities in two areas: studies of population history and 
studies of natural selection. 

Box 1. Surprising findings about human history illuminated by ancient DNA

• There was archaic Neanderthal gene flow into anatomically
modern humans outside Africa 37 000-85 000 years ago [133].
This gene flow contributed on the order of 2% of the genetic
ancestry of non-Africans [106]. The first evidence of ancient
admixture in non-Africans was suggested based on analysis of
present-day humans, without access to any ancient DNA [134].
However, the consensus about whether admixture occurred only
changed after ancient DNA evidence showed that the deeply
diverged segments in present-day non-Africans are related to
Neanderthals [106].

• A previously unknown archaic population that was neither
Neanderthal nor modern human was present in Siberia before
50 000 years ago [135,136]. The discovery of the 'Denisovans'
shows how ancient DNA can reveal 'genomes in search of a fossil'
- populations whose existence was not expected based on the
archaeological or skeletal record.

• There was gene flow from a population related to the Denisovans
into the ancestors of present-day aboriginal people from New
Guinea, Australia, and the Philippines [86,107,108,136,137]. These
populations all live in Oceania, far from where the Denisovan
bone was found in Siberia, a finding that again would not have
been expected in the absence of ancient DNA data.

• Native Americans are admixed between an Ancient North
Eurasian population and a population related to present-day East
Asians as a result of events before the diversification of Native
American populations in the New World [52].

• There were at least two important migration events into Europe
over the past 9000 years [53,55-60,62].

• Populations in the Americas have experienced multiple episodes
of turnover, such that an individual who lived in Greenland
around 4000 years ago is more closely related to populations
currently in Siberia than to present-day Greenland Inuit populations
[109], and an individual who lived in North America around
12 000 years ago is more closely related to present-day Native
South American populations than to present-day Native North
American populations [138].

The literature on ancient mtDNA contains several
promising study designs. For example, one might sample
multiple individuals from different archaeological cultures
at a single time-point [60]. Such a ‘horizontal time slice’
allows a snapshot of population structure over a broad 
geographic region, which can then be compared to the 
relatively complete picture of population structure today. 
An alternative study design (similar to that used in [56]) is 
to take a single geographic location and sample individuals 
from multiple time-points. This ‘vertical time slice’ allows 
direct quantification of changes in population composition 
over time. 

mtDNA studies have major drawbacks compared with 
analysis of the whole genome, however [81]. First, mtDNA is
inherited maternally, and thus does not capture any infor- 
mation about the history of males (which may differ from 
that of females owing to sex-biased demographic processes). 
More importantly, a study of a single locus (or two loci if the 
Y chromosome is included) has less statistical resolution for 
studies of history than do studies of the nuclear genome. The
reason is that whole-genome studies of an individual contain 
information about thousands of that individual’s ancestors, 
and not only information about a single ancestral lineage. It 
is thus important that the more advanced study designs 
based on mtDNA be combined with analysis of the more 
informative autosomal DNA. 

What outstanding questions in population history can be 
addressed with these designs? One question is whether 
changes in populations over time are typically gradual - 
owing to consistent, low-level gene flow between neighboring 
populations - or punctate, with migration events rapidly
altering the genetic composition of a region. One line of work 
on modeling human history explicitly assumes the latter 
[30,54,65,76,82]. If this assumption is unfounded, however,
other models that accommodate continuous gene flow (e.g., 
[83,84]) may be more appropriate. In principle, distinguishing
between these possibilities with a time-series of ancient DNA 
is straightforward; in Figure 4A we show how the predictions 
differ. 
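
The contrast between the two scenarios can be sketched with a toy calculation. The Python example below is an illustration under assumed parameters and does not reproduce Figure 4A. It tracks the fraction of ancestry in a region that derives from an incoming population, either as a single pulse at a hypothetical generation or as continuous gene flow in which a fixed fraction `rate` of the population is replaced by migrants every generation. The rate is chosen so that both histories end at the same ancestry fraction.

```python
def pulse_trajectory(n_generations, pulse_generation, pulse_fraction):
    """Migrant ancestry over time when all admixture happens in one generation."""
    return [pulse_fraction if t >= pulse_generation else 0.0
            for t in range(n_generations + 1)]


def continuous_trajectory(n_generations, rate):
    """Migrant ancestry over time when a fraction `rate` is replaced each generation."""
    trajectory = [0.0]
    for _ in range(n_generations):
        previous = trajectory[-1]
        trajectory.append(previous + rate * (1.0 - previous))
    return trajectory


# Hypothetical settings: 100 generations, ending at 50% migrant ancestry.
n_generations = 100
final_fraction = 0.5

# Solve 1 - (1 - rate)^n = final_fraction for the per-generation rate.
rate = 1.0 - (1.0 - final_fraction) ** (1.0 / n_generations)

pulse = pulse_trajectory(n_generations, pulse_generation=40, pulse_fraction=final_fraction)
continuous = continuous_trajectory(n_generations, rate)

for t in (0, 20, 40, 60, 80, 100):
    print(f"generation {t}: pulse {pulse[t]:.3f}, continuous {continuous[t]:.3f}")
```

Samples taken only at the end of the interval cannot separate the two histories, because both reach the same final value. A time series of ancient samples distinguishes them: the pulse model predicts a single step, while the continuous model predicts a gradual rise.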

Ancient DNA is also a promising way to address ongoing 
historical debates about the origins of different popula- 
tions. Do linguistic isolates such as the Basques in Europe 
have different mixtures of ancestries than their neighbors 
[85]? When did the west Eurasian-related population(s) 
that admixed with most Indian populations first appear in 
South Asia [68]? Did the first modern human inhabitants of 
East Asia descend from an earlier out-of-Africa migration 
than the populations living in East Asia today [86,87]? 
Answering these questions will require a time-series of 
snapshots of human genetic structure, combining the two 
types of study design mentioned above. If experience is a 
guide, this type of information will uncover additional 
unexpected aspects of human history. 

The admixture and population replacements identified 
by ancient DNA also have implications for studies of 
natural selection. It is often assumed that populations 
have been in their current geographic locations long 
enough to adapt to their local conditions. This is explicit 
in approaches that test for correlations between environ- 
mental variables and allele frequencies (e.g., [88-90]) and 
implicit in studies that interpret selected loci in terms of 
the current locations of populations (e.g., [91]). 

To what extent are the geographic distributions of selected
alleles today indicative of the geographic distributions of
selective pressures? (The answer to this question may depend
on the selective pressure in question.) In individual
cases, these two distributions are highly correlated; the
classic example is the correlation between malaria incidence
and disorders of hemoglobin [92]. For other cases, the correlation
is imperfect. For example, alleles causing light skin
pigmentation are at high frequency in northern Africa [93].
If the selective pressure causing light skin is indeed a 
relative lack of ultraviolet radiation [94], it seems reason- 
able to expect that this pressure has not affected people 
living around the Sahara desert. It is thus likely that the 
high frequency of alleles causing light skin pigmentation in 
north Africa was caused by the arrival of lightly pigmented 
people [70]. 

More generally, it has been observed that the geograph- 
ic distributions of alleles under natural selection in 
humans tend to match the distribution of neutral popula- 
tion structure rather than any obvious geographic varia- 
tion in selection pressures [95,96]. One factor contributing 
to this observation may be that selection pressures on 
individual loci are relatively weak owing to the quantitative
nature of phenotypes [97-100]. Another contributing
factor may be that population movements over the past 
several thousand years have to some extent decoupled the 
geographic distributions of selected alleles from the geo- 
graphic distributions of selective pressures. From the point 
of view of an individual allele, the movement of populations 
acts as a form of fluctuating selection; as populations move 
from environment to environment, the selection coefficient 
on an allele may change in both sign and magnitude 
(assuming a fixed selection coefficient in a given environ- 
ment). This means that the environment that dominated 
the allele frequency trajectory may not be the environment 
that the allele is found in today. 
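
The argument above can be made concrete with a small deterministic sketch in Python. It assumes a simple genic selection model (carriers of the allele have relative fitness 1 + s) and a hypothetical population that spends time in two environments with selection coefficients of opposite sign; the function names and all numerical values are illustrative assumptions, not estimates from data.

```python
def next_frequency(p, s):
    """One generation of genic selection: carriers have relative fitness 1 + s."""
    return p * (1.0 + s) / (1.0 + p * s)


def trajectory(p0, environments):
    """Allele frequency over time as a population moves between environments.

    `environments` is a list of (selection_coefficient, generations) pairs,
    in the order in which the population experiences them.
    """
    freqs = [p0]
    for s, generations in environments:
        for _ in range(generations):
            freqs.append(next_frequency(freqs[-1], s))
    return freqs


if __name__ == "__main__":
    # Favored for 200 generations, then mildly disfavored after a move.
    freqs = trajectory(0.05, [(0.02, 200), (-0.005, 50)])
    print(round(freqs[0], 3), round(freqs[200], 3), round(freqs[-1], 3))
```

In this toy example the allele remains common in the second environment even though it is disfavored there, because its frequency was set mainly by the environment the population occupied earlier.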

Ancient DNA is also a potentially transformative tool for 
understanding human adaptation more generally. Nearly 
all methods for detecting positive selection at the genetic 
level are based around the principle that a selected allele 
changes frequency more quickly than a neutral locus. 
Tracking the trajectories of allele frequencies over time
allows direct access to this information, and makes it
possible to infer precisely when in time (and with more
certainty where geographically) genetic changes began to
arise [101]. In fact, in the presence of admixture it is simple
to create scenarios where the frequency trajectory of an
allele over time allows one to identify selection despite
there being no net change in allele frequency (Figure 4B).
To date, studies of selection over time have been limited
either to small sample sizes [53,102] or to small numbers of
sites [101,103-105]. However, whole-genome technologies
should make it possible to interrogate many thousands of
phenotypically relevant variants simultaneously.

Figure 4. Time-series allow tests of different models of human history. (A) Different scenarios for the dynamics of gene flow. We show idealized time-series of admixture
proportions in a population under models of continuous gene flow and discrete admixture. (B) Selection in the presence of admixture. We show idealized time-series of the
allele frequencies at a selected allele and admixture proportions in a population. Note that despite selection the allele does not show any net change in frequency.
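
A scenario of the kind shown in Figure 4B can be sketched in a few lines of Python. The sketch assumes genic selection followed by a single admixture pulse from a source population that lacks the allele, with the pulse sized so that the allele returns exactly to its starting frequency. The model, the function names, and the parameter values are assumptions chosen for illustration only.

```python
def select(p, s):
    """One generation of genic selection with relative fitness 1 + s."""
    return p * (1.0 + s) / (1.0 + p * s)


def selection_then_pulse(p0, s, generations, source_freq=0.0):
    """Selection for `generations`, then one admixture pulse that restores p0.

    Returns the frequency trajectory and the admixture fraction m.
    Assumes s > 0 and source_freq < p0, so that 0 < m < 1.
    """
    freqs = [p0]
    for _ in range(generations):
        freqs.append(select(freqs[-1], s))
    peak = freqs[-1]
    # Solve (1 - m) * peak + m * source_freq = p0 for the pulse size m.
    m = (peak - p0) / (peak - source_freq)
    freqs.append((1.0 - m) * peak + m * source_freq)
    return freqs, m


if __name__ == "__main__":
    freqs, m = selection_then_pulse(0.2, 0.02, 100)
    print(round(freqs[0], 3), round(max(freqs), 3), round(freqs[-1], 3), round(m, 3))
```

Comparing only the first and last time points would suggest that nothing happened; the intermediate samples are what reveal the rise due to selection and the drop due to admixture.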

Taming the wild west of ancient DNA

Realizing the potential of ancient DNA studies will require 
systematic approaches. However, ancient DNA studies 
today are often more spectacular than systematic. The 
paradigmatic approach to ancient DNA research in the 
era of high-throughput sequencing involves identifying a 
‘golden’ archaeological sample that yields usable DNA, and 
obtaining a complete or partial genome sequence from it 
(e.g., [52,59,106-110]). To some extent the discovery of 
analyzable samples has been the main driver of the scien- 
tific questions asked. However, the future of ancient DNA 
research requires a more systematic approach: hypothesis- 
driven sampling across time and space, and analysis of 
much larger sample sizes.

Ancient DNA research has three major experimental 
challenges and two computational challenges that jointly 
need to be addressed by researchers who wish to access this 
transformative technology. 

The first experimental challenge is the danger of contam- 
ination from archaeologists or laboratory researchers who 
handle the sample. This is a recurring problem [111-118].
Although experimental guidelines (e.g., [116,119,120]) as 
well as laboratory methods for reducing the possibility of 
contamination [106] and empirically assessing the authen- 
ticity of ancient DNA sequences [121] have greatly improved 
the situation, in practice this concern will never disappear. 
The need to control for possible contamination is the
single most important reason why, to date,
convincing ancient DNA research has been dominated by a 
small number of specialist laboratories with the expertise 
and facilities to control contamination. All laboratories that 
carry out this type of work will need to maintain the same 
level of vigilance currently maintained at specialist labora- 
tories. To reduce contamination at the source, it is also 
important that archaeologists excavating new samples become
trained in handling samples in ways that minimize
contamination. Valuable measures include wearing sterile 
gloves and a protective suit while excavating remains, not 
washing the remains, and immediately placing the remains 
in a sterile plastic bag and refrigerating them before ship- 
ment to an ancient DNA laboratory. 

A second experimental challenge is the difficulty of 
identifying human remains that contain preserved DNA. 
This often requires screening dozens of individuals from 
multiple sites, with success rates in DNA extraction 
depending strongly on the conditions experienced by the 
sample: its age, temperature, humidity, acidity, the part of 
the body from which it derives, the speed with which it
dried after death, whether or not it is from a sample that
was rapidly defleshed, and other factors that are not 
currently understood. Modern methods have increased 
the efficiency of DNA extraction by capturing the short 
molecules that make up the great majority of any ancient 
sample [122,123]. In addition, improved library prepara- 
tion methods have increased the fraction of molecules in an 
extract that are amenable to sequencing [107,110]. Nevertheless,
it is still the case that there is great variability in
success. The secret of successful ancient DNA research is 
not luck, but hard work: screening many carefully chosen 
and prepared samples until a subset are identified that 
perform well. 

A third experimental challenge is the difficulty of find- 
ing a sample that has a sufficiently high proportion of DNA 
from the bone itself to be economically analyzed. Concrete- 
ly, the challenge is that, for many ancient DNA extracts, 
the proportion of endogenous DNA is very low, on the order 
of 1% or less. For example, a recent study on a ~40 000 year 
old sample from the Tianyuan site of Northern China 
worked successfully with a sample that had an endogenous 
DNA proportion of 0.02% [110]. For such samples, important
as they are, it is not economical to simply carry out
brute-force sequencing of random genomic fragments and 
expect to obtain sufficient coverage to make meaningful 
inferences. 
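
The scale of the problem can be seen from a back-of-the-envelope calculation, sketched here in Python. The genome size (about 3 billion bases) is approximate, and the read length of 50 bases and the target of onefold coverage are assumptions made for illustration; the calculation also ignores duplicate molecules and reads that fail to map, so it understates the true cost.

```python
def reads_required(target_coverage, genome_size, read_length, endogenous_fraction):
    """Total reads to sequence so that endogenous reads reach the target coverage."""
    endogenous_reads = target_coverage * genome_size / read_length
    return endogenous_reads / endogenous_fraction


if __name__ == "__main__":
    genome_size = 3e9   # approximate human genome size in bases
    read_length = 50    # assumed mean length of an ancient DNA read
    for fraction in (0.5, 0.01, 0.0002):
        total = reads_required(1.0, genome_size, read_length, fraction)
        print(f"endogenous {fraction:.2%}: {total:.1e} reads for 1x coverage")
```

Because the required sequencing scales with the inverse of the endogenous proportion, a sample at 0.02% needs 50 times more sequencing than a sample at 1% to reach the same coverage.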

The challenges of ancient DNA research do not end once 
the sample preparation is complete because the datasets 
that are generated pose formidable computational and 
analytical challenges. The first challenge is that, once 
useable samples are obtained and sequenced, the data 
need to be processed. Ancient DNA laboratories, which 
traditionally have their strongest expertise in archaeology, 
physical anthropology, or biochemistry, often lack the 
bioinformatics expertise, data-processing power, and da- 
ta-storage solutions necessary to handle the millions or 
even billions of sequences that are generated by modern 
ancient DNA studies. Moreover, ancient DNA data also 
require tailored bioinformatics tools for handling the short 
sequences that are characteristic of old samples [124]; one 
cannot simply use existing tools such as SAMtools [125] or 
the Genome Analysis Toolkit [126] with the default set- 
tings. For example, the sequenced DNA fragments are 
usually short and degraded, and are expected to have 
C→T and G→A errors at the ends of the molecules due
to cytosine deamination. The data additionally need to be 
assessed computationally for evidence of contamination, 
for example by checking the rate of molecules that do not 
map to the mtDNA consensus sequence obtained for the 
sample [110]. If a sample is determined to be contaminat- 
ed, the characteristic ancient DNA errors can be leveraged 
to reduce the level of contamination (assuming that the 
contamination is not old) by restricting the analysis to 
sequences that contain differences from the human refer- 
ence genome sequence that are hallmarks of errors due to 
ancient DNA degradation [127,128].
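
The idea of using damage patterns as a filter can be illustrated with a minimal Python sketch. It assumes that each read has already been aligned to the reference without gaps and is supplied as a (read, reference) pair of equal-length strings on the reference strand, and it assumes a damage signature of C→T near the 5' end and G→A near the 3' end. The function names and the three-base window are hypothetical; this is not the procedure of any specific published pipeline.

```python
def has_terminal_damage(read, ref, window=3):
    """True if the read shows C->T near its 5' end or G->A near its 3' end."""
    if len(read) != len(ref):
        raise ValueError("read and reference segment must have equal length")
    five_prime = any(r == "C" and q == "T"
                     for q, r in zip(read[:window], ref[:window]))
    three_prime = any(r == "G" and q == "A"
                      for q, r in zip(read[-window:], ref[-window:]))
    return five_prime or three_prime


def ct_rate_by_position(pairs, max_pos=5):
    """Fraction of reference C sites read as T, by distance from the 5' end.

    Positions with no reference C in any read are reported as None.
    """
    rates = []
    for i in range(max_pos):
        c_sites = 0
        c_to_t = 0
        for read, ref in pairs:
            if i < len(ref) and ref[i] == "C":
                c_sites += 1
                if read[i] == "T":
                    c_to_t += 1
        rates.append(c_to_t / c_sites if c_sites else None)
    return rates


def keep_damaged(pairs, window=3):
    """Restrict the analysis to reads carrying the damage signature."""
    return [(read, ref) for read, ref in pairs
            if has_terminal_damage(read, ref, window)]
```

An elevated C→T rate at the first positions of the reads supports authenticity, and restricting to damaged reads trades a large loss of data for a reduced share of recent contamination.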

A second computational challenge for ancient DNA 
research is that, once the data are processed and a sample 
is determined as likely to be authentic, the data also need 
to be analyzed using statistical methods that infer popula- 
tion history. Many methods that have been developed to 
make inferences about population relationships are not
substantially biased by the fact that ancient DNA is old
and error-prone [52,59,106-110]. However, these methods
must still be implemented carefully to produce meaningful
results. In particular, the methods need to handle the
complications of ancient DNA, such as the fact that ancient
DNA data often have low levels of redundancy, and therefore
that genotypes at each position in the genome often cannot
be determined unambiguously.
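
To see why genotypes are ambiguous at low redundancy, consider the following Python sketch of genotype likelihoods under a simple binomial model. The model (independent reads and a single symmetric sequencing error rate) and the default error rate of 1% are assumptions for illustration; the text above does not prescribe a specific statistical approach.

```python
from math import comb


def genotype_likelihoods(n_ref, n_alt, error_rate=0.01):
    """Likelihood of the read counts given 0, 1, or 2 copies of the alternate allele."""
    n = n_ref + n_alt
    likelihoods = []
    for alt_copies in (0, 1, 2):
        # Probability that a single read shows the alternate allele.
        p_alt = (alt_copies / 2) * (1 - error_rate) + (1 - alt_copies / 2) * error_rate
        likelihoods.append(comb(n, n_alt) * p_alt ** n_alt * (1 - p_alt) ** n_ref)
    return likelihoods


def normalized(likelihoods):
    """Rescale likelihoods to sum to one (posterior under a flat prior)."""
    total = sum(likelihoods)
    return [value / total for value in likelihoods]


if __name__ == "__main__":
    print(normalized(genotype_likelihoods(0, 1)))    # a single read
    print(normalized(genotype_likelihoods(10, 10)))  # twentyfold redundancy
```

With a single read showing the alternate allele, the heterozygous genotype retains about a third of the normalized likelihood, so a hard genotype call would be unreliable; methods that work with likelihoods or with single sampled reads avoid making that call.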

At present few laboratories have the experimental and 
computational expertise to address all these challenges 
simultaneously. As a result, ancient DNA research has 
been dominated by a few vertically integrated and well- 
funded laboratories that combine all these skills under the 
same roof. The high barriers to entry have meant that the 
full potential of ancient DNA has been largely untapped. In 
what follows we sketch out a way forward that we expect 
will make ancient DNA analysis more accessible to the 
broad research community, including archaeologists. 

Democratization of ancient DNA technology 

The usual paradigm in ancient DNA analysis of the nuclear 
genome has been to identify a sample that has a high 
enough proportion of DNA to deeply sequence. Such golden 
samples are rare, and are typically identified only after 
laborious screening of many dozens of samples that have 
low proportions of human DNA. Once a sample is found 
that has an appreciable proportion of human DNA, it is 
typically sequenced to as high a coverage as possible 
(sometimes the limited number of starting molecules in 
many ancient DNA libraries is the main factor limiting 
sequencing depth). All of this is very expensive, and has 
been an important barrier to entry into ancient DNA work 
for less well funded laboratories. 

Whole-genome sequences, however, are not required 
for most historical inferences. In a paper that was impor- 
tant not only for what it showed about history but also for 
what it showed about the quantity of information that can 
be extracted from small amounts of data, Skoglund et al. 
[62] found that even low levels of genome coverage per 
sample - only 1-5% of genomic bases were covered in 
their case - were sufficient to support profound historical 
inferences. 
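
A simple calculation, sketched in Python below, suggests why such sparse coverage can still be informative. It assumes a hypothetical reference panel of 500 000 SNPs and assumes that each SNP is covered independently with the same probability; neither number is taken from the study cited above.

```python
def expected_overlap(panel_size, covered_fraction_a, covered_fraction_b=1.0):
    """Expected number of panel SNPs observed in both samples.

    Assumes each SNP is covered independently with the given probability.
    Leave `covered_fraction_b` at 1.0 when comparing against fully
    genotyped present-day samples.
    """
    return panel_size * covered_fraction_a * covered_fraction_b


if __name__ == "__main__":
    panel = 500_000  # hypothetical panel size
    for fraction in (0.01, 0.05):
        print(fraction, expected_overlap(panel, fraction))
    # Two ancient samples, each covering 5% of sites:
    print(expected_overlap(panel, 0.05, 0.05))
```

Even at 1% coverage, thousands of SNPs remain for comparison with present-day samples, although the overlap between two sparsely covered ancient samples shrinks with the product of their coverages.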

A promising way to make ancient DNA analysis acces- 
sible to a much larger number of laboratories is to use 
targeted capture approaches that enrich a sample for 
human DNA. In one approach to target enrichment 
[129] it was shown that it is possible to take RNA baits 
obtained by transcribing a whole human genome and
hybridize them in solution to an ancient DNA library, thus
increasing the fraction of human DNA to be sequenced by 
more than an order of magnitude. Another group [110] 
showed that it is possible to synthesize oligonucleotide 
DNA baits targeted at a specified subset of the genome - 
the entire mtDNA sequence, or the coding sequences of all 
genes - and to hybridize them to ancient DNA libraries in solution
to enrich a library for molecules from the targeted subset of 
the genome. This strategy was used to obtain a high-quality 
mitochondrial genome sequence from a ~400 000 year old 
archaic human [127], as well as approximately twofold 
redundant coverage of chromosome 21 from the ~40 000 
year old Tianyuan sample from northern China [110]. 

We ourselves are particularly enthusiastic about the 
possibility of adapting a technology such as that described 
above to enrich human samples for panels of several 
hundred thousand SNPs that have already been genotyped 
on present-day samples. This is a sufficient number of 
SNPs that it would allow for high-resolution analysis of 
how an ancient sample relates to present-day as well as 
other ancient samples. The strategy has two potential 
advantages. First, through enrichment, it allows analysis 
of samples with much less than 10% human DNA, which 
are not economical for whole-genome sequencing studies. 
Second, assuming that it works, it requires about two 
orders of magnitude less sequencing per sample to saturate 
all its targets (Table 1). We caution that there are some 
questions - for example, estimation of population diver- 
gence times - that rely on the identification of sample- 
specific mutations and may be better addressed with 
whole-genome sequencing data. Nevertheless, we believe 
that a capture experiment can answer the substantial 
majority of questions related to population history or 
natural selection that are addressable with genetic data, 
while allowing larger numbers of samples to be analyzed 
for the same cost. 

An important goal for the coming years should be to 
make ancient DNA a tool that will become fully accessible 
not only to smaller laboratories but also to archaeologists. 
In this regard it is important to encourage interaction and 
collaboration between the genetics and archaeology com- 
munities to develop standards for the interpretation and 
use of genetic data in answering questions relevant to 
archaeology. 

Members of the archaeology community are already
sophisticated consumers of other scientific technologies
for analyzing ancient biological remains such as 14C analysis
(for date estimation) and stable isotope analysis (for making
inferences about diet). Currently, when archaeologists have
a sample that they wish to analyze, they send it to specialist
(sometimes commercial) laboratories, which then provide a
carbon date and/or an isotope analysis, together with a
report giving interpretation. We envisage a future in which
ancient DNA analysis could similarly become available as a
service to archaeologists. Concretely, archaeologists would
be able to send skeletal material to a specialist laboratory for
analysis, where it could be tested for ancient DNA. If ancient
DNA is detected, a whole mitochondrial sequence could be
produced and a sex determined. For ancient samples with
evidence of uncontaminated DNA, whole-genome data could
be produced. The relationships of the mtDNA to others that
have been previously generated could be summarized in the
form of a tree, and the affinities of the population could be
summarized using a method such as principal
component analysis.

Table 1. Comparison of two strategies for ancient DNA studies of history

                                                        Whole-genome    Capture and sequencing
                                                        sequencing      of 300 000 SNPs

Number of samples that need to be screened (a)          1               1/10
Amount of sequencing that needs to be performed (a)     1               1/100
High-resolution inference of population relationships   Yes             Yes
Allows studies of selection                             Yes             Yes
Variants specific to sample can be discovered           Yes             No
Works for samples with a low % of human DNA             No              Yes

(a) These are rough estimates based on discussion with colleagues. The estimated amount of work required for capturing 300 000 SNPs is expressed as a fraction of that
required for whole-genome sequencing.

Box 2. Outstanding Questions

• What was the genetic makeup of humans in eastern Africa before
the arrival of Bantu speakers? The population genetic structure of
sub-Saharan Africa has been transformed by the expansion of
Bantu-speaking agriculturalists over the past 3000 years. As a
concrete example, the population structure of eastern Africa before
this expansion remains controversial [139,140]. Ancient DNA from
samples before the Bantu expansion could settle this debate.
Although the climate in much, but not all, of sub-Saharan Africa is
less favorable for preservation of DNA than that of northern
Eurasia, it is not clear whether enough African samples have been
tested to determine whether or not ancient DNA analysis works.
Moreover, the timescale of the Bantu expansion is only several
thousand years rather than tens of thousands of years. In light of
continuing technological progress in ancient DNA extraction
(including the successful extraction of DNA from a ~400 000 year
old sample in Spain [127]), we are hopeful that ancient DNA studies
of some African samples within the past few thousand years may
become possible.

• When were the geographic distributions of alleles under positive
selection established? Loci under positive selection in humans have
allele frequencies with characteristic geographic patterns, which
have been interpreted as 'west Eurasian', 'east Asian', and 'non-
African' selective sweeps [95]. Were these patterns established tens
of thousands of years ago during the initial out-of-Africa expansion
by modern humans? Or were they established after later population
movements? Resolving these questions will become possible once
substantial numbers of ancient samples are genotyped. Work in
this area is only now beginning [101].

• Was the spread of Indo-European languages caused by large-scale
migrations of people or did language shifts occur without extensive
population replacement? For times before the invention of writing,
it will never be possible to directly relate the language that people
spoke to their remains. Nevertheless, genetic data may still be
informative about the historical events that accompany linguistic
expansions. The origin of the Indo-European languages - spoken
500 years ago across Europe and West, Central, and South Asia and
even more widely distributed today - is a particularly important
question [141,142]. These languages likely spread across Eurasia
and diversified within the past 5000-10 000 years. Was this spread of
languages caused by a large-scale movement of people? It may
now be possible to obtain ancient DNA from cultures known to
have spoken Indo-European languages - for example, Hittites and
Tocharians - and to compare these populations with their
neighbors to determine whether they harbor a genetic signature
specific to the Indo-European speakers. It may also be possible to
search for the spread of (currently hypothetical) Indo-European
genetic signatures in Europe and India at the times when various
hypotheses have suggested that they may have arrived.

To make such a future possible it will be necessary to 
build up a database of present-day human and ancient 
DNA samples all genotyped at the same set of SNPs to 
which any new sample can be compared. In addition, it will 
be necessary to write software to compare automatically a 
new sample to samples in the database [54,130-132], and 
this could be used as the basis for producing a report for 
archaeologists on how the sample relates to diverse pres- 
ent-day humans and to other ancient samples. Because of 
the subtleties of interpreting genetic data, it will be nec- 
essary for archaeologists to work collaboratively with 
population geneticists to provide in-depth interpretation 
of the results of such studies. Such a report would, howev- 
er, be a useful tool for giving archaeologists a first impres- 
sion of the biological ancestry of their samples, including 
information on the sex and plausibility of geographic 
origin. 
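
As a minimal sketch of what such comparison software might compute, the Python example below ranks reference populations by their mean allelic mismatch to a new sample. It assumes that the new sample is represented by one sampled allele per SNP (0 or 1, or None where the site is not covered) and that database individuals carry genotypes coded as 0, 1, or 2 copies of the same allele. The data layout, the function names, and the choice of statistic are hypothetical and are not those of the methods cited above.

```python
def allele_mismatch(sample, reference):
    """Mean probability that a random reference allele differs from the sample's.

    `sample` holds 0, 1, or None per SNP; `reference` holds 0, 1, 2, or None.
    Returns None if the two share no covered SNPs.
    """
    total = 0.0
    sites = 0
    for allele, genotype in zip(sample, reference):
        if allele is None or genotype is None:
            continue
        total += abs(allele - genotype / 2)
        sites += 1
    return total / sites if sites else None


def rank_populations(sample, database):
    """Sort populations from most to least similar to the new sample.

    `database` maps a population label to a list of reference genotype lists.
    """
    scores = {}
    for population, individuals in database.items():
        distances = [allele_mismatch(sample, ind) for ind in individuals]
        distances = [d for d in distances if d is not None]
        if distances:
            scores[population] = sum(distances) / len(distances)
    return sorted(scores.items(), key=lambda item: item[1])
```

A ranking of this kind could serve only as a first impression; as noted above, interpreting it would still call for collaboration with population geneticists.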

We particularly wish to highlight the potential of an- 
cient DNA as a direct tool for elucidating the population 
structure and patterns of relatedness within a particular 
archaeological site. We mention here two types of analysis 
of particular interest: (i) correlating genetic findings - sex, 


relatedness and ancestry - to archaeological information 
derived from grave goods, status symbols, and other 
objects; and (ii) the identification of outlier individuals 
who are unusual in terms of their ancestry relative to 
others at nearby sites. 

Concluding remarks and future directions 

We have argued that it is likely to be fruitful to reexamine
many aspects of human population history and natural
selection from a perspective in which population movements
and admixture play a central role. Moreover, we
have shown that ancient DNA has emerged as a transformative
tool for addressing questions about human history
- it is not merely an interesting side-show in terms of the
insights that it can bring to our understanding of the human
past, but a tremendous leap forward beyond what has been
possible through analysis of DNA from present-day
humans. What has already been discovered about human 
history from whole-genome analysis and ancient DNA is 
only the tip of the iceberg, and we expect that the coming 
decade will bring even more important discoveries. We 
conclude by listing several questions that are likely to 
be addressable (Box 2). 